Hierarchical Translation Equivalence over Word Alignments
Abstract
We present a theory of word alignments in machine translation (MT) that equips every word alignment with a hierarchical representation whose exact semantics is defined over the translation equivalence relations known as hierarchical phrase pairs. The hierarchical representation consists of a set of synchronous trees, called Hierarchical Alignment Trees (HATs), each specifying a bilingual compositional build-up for a given word-aligned, translation-equivalent sentence pair. Every HAT consists of a single tree whose nodes are decorated with local transducers that conservatively generalize the asymmetric bilingual trees of Inversion Transduction Grammar (ITG). The HAT representation is proven to be semantically equivalent to the word alignment it represents, and minimal among the semantically equivalent alternatives, because it densely represents the subsumption order between pairs of (hierarchical) phrase pairs. We present an algorithm that interprets every word alignment as a semantically equivalent set of HATs, and we contribute an empirical study of the exact coverage of subclasses of HATs that are semantically equivalent to subclasses of manual and automatic word alignments.
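To make the notion of translation equivalence over a word alignment concrete, the following is a minimal Python sketch of the standard phrase-pair consistency check: a source span and a target span form a phrase pair when no alignment link crosses the boundary of the pair. It is not the HAT construction algorithm from the paper. The function name `phrase_pairs`, the 0-based link encoding, and the restriction to tight target spans (no extension over unaligned words) are assumptions made for illustration.

```python
from typing import List, Set, Tuple

Link = Tuple[int, int]  # (source position, target position), 0-based; assumed encoding
Span = Tuple[int, int]  # inclusive (start, end)


def phrase_pairs(alignment: Set[Link], src_len: int) -> List[Tuple[Span, Span]]:
    """Enumerate tight phrase pairs consistent with a word alignment.

    Hypothetical helper for illustration only. A pair (source span,
    target span) is kept when every link that touches one span also
    falls inside the other span.
    """
    pairs = []
    for s_start in range(src_len):
        for s_end in range(s_start, src_len):
            # Target positions linked to the candidate source span.
            targets = [t for (s, t) in alignment if s_start <= s <= s_end]
            if not targets:
                continue  # spans with no links are skipped in this sketch
            t_start, t_end = min(targets), max(targets)
            # Every link landing in the target span must start in the source span.
            consistent = all(
                s_start <= s <= s_end
                for (s, t) in alignment
                if t_start <= t <= t_end
            )
            if consistent:
                pairs.append(((s_start, s_end), (t_start, t_end)))
    return pairs


# A three-word pair in which the last two words are swapped.
swap = {(0, 0), (1, 2), (2, 1)}
swap_pairs = phrase_pairs(swap, 3)

# The permutation 2-4-1-3, a classic case with no binary decomposition.
perm_2413 = {(0, 1), (1, 3), (2, 0), (3, 2)}
perm_pairs = phrase_pairs(perm_2413, 4)
```

In the second example the only phrase pairs are the four single-word pairs and the whole sentence pair. An alignment of this shape cannot be built by binary ITG rules, which is the kind of case that node-level transducers more general than binary ITG operators are meant to cover.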
Similar resources
Visualization, Search and Analysis of Hierarchical Translation Equivalence in Machine Translation Data
Translation equivalence constitutes the basis of all Machine Translation systems including the recent hierarchical and syntax-based systems. For hierarchical MT research it is important to have a tool that supports the qualitative and quantitative analysis of hierarchical translation equivalence relations extracted from word alignments in data. In this paper we present such a toolkit and exempl...
A combination of hierarchical systems with forced alignments from phrase-based systems
Currently, most state-of-the-art statistical machine translation systems present a mismatch between training and generation conditions. Word alignments are computed using the well-known IBM models for single-word-based translation. Afterwards, phrases are extracted using extraction heuristics that are unrelated to the stochastic models applied for finding the word alignment. In recent years, several re...
Hierarchical Search for Word Alignment
We present a simple yet powerful hierarchical search algorithm for automatic word alignment. Our algorithm induces a forest of alignments from which we can efficiently extract a ranked k-best list. We score a given alignment within the forest with a flexible, linear discriminative model incorporating hundreds of features, and trained on a relatively small amount of annotated data. We report res...
Hierarchical Phrase-Based Translation Grammars Extracted from Alignment Posterior Probabilities
We report on investigations into hierarchical phrase-based translation grammars based on rules extracted from posterior distributions over alignments of the parallel text. Rather than restrict rule extraction to a single alignment, such as Viterbi, we instead extract rules based on posterior distributions provided by the HMM word-to-word alignment model. We define translation grammars progressi...
Equivalence in Technical Texts: The Case of Accounting Terms in English-Persian Dictionaries
Translating accounting documents in general, and accounting terminology in particular, is not a simple task, especially when new terms keep being created in pace with accounting developments. This study was carried out to find the most common and preferable ways to translate accounting terms from English into Persian. Also, an attempt was made to identify the frequently used patterns of word-fo...